Abstract. The Flexible Image Transport System (FITS) standard has been a great
boon to astronomy, allowing observatories, scientists and the public to exchange
astronomical information easily. The FITS standard is, however, showing its age. When
FITS was developed in the late 1970s, its authors made a number of implementation choices
for the format that, while common at the time, are now seen to limit its utility with modern
data. The authors of the FITS standard could not have foreseen the challenges which we
face today in astronomical computing. The difficulties we now face include, but
are not limited to, handling an expanded range of specialized data product types
(data models), supporting the networked exchange and storage of data, handling very
large datasets, and capturing significantly more complex metadata and data
relationships.

There are members of the community today who find some (or all) of these limitations
unworkable, and have decided to move ahead with storing data in other formats.
This reaction should be taken as a wakeup call to the FITS community to make changes
in the FITS standard, or to see its usage fall. In this paper we detail some selected
important problems which exist within the FITS standard today. It is not our intention
to prescribe specific remedies to these issues; rather, we hope to call the attention of the
FITS and greater astronomical computing communities to these issues in the hope that
doing so will spur action to address them.

Thomas et al.


1. Introduction 


The Flexible Image Transport System standard (FITS; Wells et al. 1981; Hanisch et al.
2001; Pence et al. 2010) has been a fundamental part of astronomical computing for
a significant part of the past four decades. FITS has provided the central means to store
and exchange astronomical data and, because of the hard work of the FITS community, it
has become relatively easy for application writers, archivists and end-user
scientists to interchange data and work productively on many computational astronomy
problems.

While there have been significant changes, the FITS standard has evolved very
slowly since its creation in the late 1970s. FITS has added new types of metadata
conventions such as World Coordinate System (WCS; Greisen & Calabretta 2002;
Calabretta & Greisen 2002; Greisen et al. 2006) representation and data serializations
such as variable length binary tables (Cotton et al. 1995). Nevertheless, these changes
have not been sufficient to match the greater evolution in astronomical research over
the same period of time.

Astronomical research now goes beyond the paradigm of the original scientific
team consuming only the observational data for which they proposed. Astronomy
researchers have shifted towards utilizing the observations of others, accessing data from
remote archives over the Internet, and combining these data with the original
observations (or theoretical calculations) in order to obtain better and wider ranging
scientific results. Many research projects now involve diverse data sets from a range of
sources, and instruments in astronomy now produce datasets many orders of magnitude
larger than were common at the time FITS was born. Additionally, astronomers
have come to rely increasingly on others to write software to help process and
analyze their data. Common libraries, analysis environments, pipeline processed data,
applications and services provided by third parties form a crucial foundation for many
astronomers’ toolboxes.

This evolution in research practice poses many new challenges for the 21st century.
The large volume of data, the shared software infrastructure, the distributed nature of
the data holdings and the increasing complexity of the information we capture mandate
that the data format enable the machine to do as much as possible to handle the
interchange, storage and processing of scientific information.

Because FITS has shown difficulties in these areas, some members of the community
have gravitated away from FITS, seeking more capable solutions. Other data formats
serving in this role include the Starlink Hierarchical Data System (HDS; Disney & Wallace
1982; Economou et al. 2014) and the Hierarchical Data Format version 5 (HDF5), which
has been adopted by the Low-Frequency Array for radio astronomy project (LOFAR;
Anderson et al. 2011). We predict that the use of FITS will inevitably decline should it
not adapt to these new challenges.

In this paper we detail some selected important problems which exist within the
FITS standard today. It is not our intention to prescribe specific remedies to these
issues; rather, we hope to call the attention of the FITS and greater astronomical computing
communities to these issues in the hope that doing so will spur action to address them.
2. Problems - Deficiencies of FITS for Modern Astronomical Computing


There is not enough space in this paper to describe in detail the deficiencies
that we see in the current incarnation of FITS. Instead we summarize the issues here
and will present a more detailed examination in a subsequent paper.


2.1. Lack of versioning, semantics and encodings 


There is no standard way to specify the version of a given FITS file or what extensions
it supports. You must read the file and determine dynamically which extensions are
present and whether they are understood. The “once FITS, forever FITS” maxim gives
you some confidence that you can always read a FITS file, but you cannot be sure that
it contains nothing that you do not understand. Explicit versioning of FITS files would
help, but there also needs to be a way to declare which particular data model is being used,
from variants such as OIFITS (Thureau et al. 2006), MBFITS (Muders et al. 2006), and
FITS-IDI (Greisen 2011), and to validate the contents against a namespaced schema.

The allowed character set in FITS of 7-bit US-ASCII is overly restrictive in a 
Unicode world. It is unacceptable that FITS authors cannot use special scientific or 
mathematical symbols (e.g. a degree symbol) or capture non-English text in tables or 
FITS headers. 
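
As a small illustration of this restriction, the Python sketch below tests whether a
string stays within the printable 7-bit ASCII range (decimal 32 to 126) to which FITS
header text is limited. The function name is hypothetical and is not taken from any
FITS library.

```python
def is_fits_header_text(text: str) -> bool:
    """Return True if every character is printable 7-bit ASCII (0x20-0x7E)."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in text)


# Illustrative strings; the last three contain characters outside the range.
samples = ["Declination 45 deg", "Declination 45°", "Ångström", "λ = 656.3 nm"]
for sample in samples:
    status = "ok" if is_fits_header_text(sample) else "not representable"
    print(f"{sample!r}: {status}")
```

Only the first sample passes; the degree sign, the accented letters and the Greek
letter would each have to be spelled out or transliterated by the author.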


2.2. Missing data models 


FITS can support basic data models such as tables and multi-dimensional images but 
lacks many higher level data models which enable scientific data description. To start 
with, there is no standardized way of associating the basic models in a related manner. 
Determining that a particular image extension contains the variance or mask for another 
image relies on string parsing and shared convention. 
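
The fragility of this approach can be sketched in a few lines of Python. The example
below is only an illustration: each dictionary stands in for the header of one image
extension, and the EXTNAME spellings and the lookup table are assumed conventions
rather than anything defined by the FITS standard.

```python
# Each dict stands in for the header of one image extension in a file.
# The EXTNAME values are illustrative conventions, not part of the standard.
extensions = [
    {"EXTNAME": "SCI", "EXTVER": 1},
    {"EXTNAME": "VARIANCE", "EXTVER": 1},
    {"EXTNAME": "ERR", "EXTVER": 1},
    {"EXTNAME": "dq", "EXTVER": 1},
]

# A reader must hard-code every spelling it expects to encounter.
ROLE_BY_NAME = {
    "SCI": "science",
    "VAR": "variance",
    "VARIANCE": "variance",
    "DQ": "mask",
    "MASK": "mask",
}


def guess_role(header: dict) -> str:
    """Guess the role of an extension from its EXTNAME string alone."""
    name = str(header.get("EXTNAME", "")).strip().upper()
    return ROLE_BY_NAME.get(name, "unknown")


roles = [guess_role(header) for header in extensions]
print(roles)  # ['science', 'variance', 'unknown', 'mask']
```

The "ERR" extension is not recognized because this particular reader was never told
about that spelling, even though a human would guess its purpose immediately.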

Ironically, for a format designed to handle astronomy data, FITS lacks shared
models which describe scientific errors or data quality. Archiving is a primary use case
for FITS; however, it lacks a sufficiently rich model for capturing the history or provenance
of the data. The HISTORY keyword provides only a textual representation of provenance,
which cannot be machine-read and which has a very loose meaning outside particular
applications.

Finally, the current FITS WCS data models are complex yet incomplete and
inflexible. There needs to be a way to stack mappings in an arbitrary manner to allow
for flexible model development (see e.g. Warren-Smith & Berry 1998; Berry & Jenness
2012; Hack et al. 2013).
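
To make the idea of stacking mappings concrete, the following Python sketch composes
simple coordinate transforms in series. It uses plain function composition with
hypothetical names (shift, scale, series); it is not the API of any of the cited
libraries, and the numerical values are arbitrary.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]
Transform = Callable[[Point], Point]


def shift(dx: float, dy: float) -> Transform:
    """Return a transform that translates a point by (dx, dy)."""
    return lambda p: (p[0] + dx, p[1] + dy)


def scale(sx: float, sy: float) -> Transform:
    """Return a transform that scales each axis independently."""
    return lambda p: (p[0] * sx, p[1] * sy)


def series(*steps: Transform) -> Transform:
    """Stack any number of transforms, applied left to right."""
    def apply(p: Point) -> Point:
        for step in steps:
            p = step(p)
        return p
    return apply


# Hypothetical pipeline: recentre on a reference pixel, apply a plate
# scale, then offset to a reference coordinate.
pixel_to_world = series(shift(-512.0, -512.0), scale(0.5, 0.5), shift(10.0, 20.0))

print(pixel_to_world((512.0, 512.0)))  # (10.0, 20.0)
print(pixel_to_world((514.0, 510.0)))  # (11.0, 19.0)
```

Because series returns an ordinary transform, a stacked mapping can itself be used as
one step of a larger stack, which is the kind of arbitrary composition argued for above.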


2.3. Inflexibility in representing metadata and data 

The 80-character card image drives a number of subsequent limitations which result
in poor metadata description (8-character keywords, a 68-character limit on keyword
values, and cumbersome CONTINUE card constructs). This outdated restriction also
results in the awkward implementation of some conventions, such as ESO HIERARCH
(Wicenec et al. 2009), that cannot overcome the underlying limitations of
representation. Additionally, the lack of namespaces results in uncertainty over the
meaning of metadata shared with other FITS files. Finally, the 2880-byte record is a minor but
annoying restriction which results in wasteful blocks of whitespace in many FITS files,
hampers the use of FITS to capture very small, but richly described data, and impedes
the real-time writing of FITS files.
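
The effect of the card and record sizes can be seen in a short Python sketch. It
assumes only the basic layout rules: 80-character cards, keywords of at most 8
characters, and headers padded with spaces to a multiple of 2880 bytes. The helper
names are hypothetical, and the formatter handles only simple numeric and logical
values, not quoted strings or CONTINUE cards.

```python
CARD_LENGTH = 80      # characters per header card
BLOCK_LENGTH = 2880   # bytes per FITS record (36 cards)


def make_card(keyword: str, value: str) -> str:
    """Format a simple 'KEYWORD = value' card, padded to 80 characters."""
    if len(keyword) > 8:
        raise ValueError(f"keyword {keyword!r} exceeds 8 characters")
    # Value is right-justified so that it ends in column 30.
    card = f"{keyword:<8}= {value:>20}"
    if len(card) > CARD_LENGTH:
        raise ValueError("value does not fit in one card")
    return card.ljust(CARD_LENGTH)


def padded_header(cards):
    """Append the END card and pad with spaces to a whole number of records."""
    text = "".join(cards) + "END".ljust(CARD_LENGTH)
    padding = -len(text) % BLOCK_LENGTH
    return text + " " * padding, padding


cards = [make_card("SIMPLE", "T"), make_card("BITPIX", "8"), make_card("NAXIS", "0")]
header, padding = padded_header(cards)
print(len(header), padding)  # 2880 2560

try:
    make_card("EXPOSURETIME", "30")
except ValueError as error:
    print(error)
```

In this minimal header, 2560 of the 2880 bytes are padding, and a descriptive keyword
such as EXPOSURETIME is rejected outright because it is longer than 8 characters.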
2.4. Inadequate support for large, distributed data

Modern data sets can result in files of several terabytes that must be distributed across
multiple file systems. The FITS grouping convention tries to provide this facility but
is neither robust nor transparent enough. Additionally, streaming indeterminately sized
data sets to files must be supported.


3. Summary - Significant Problems exist in the FITS standard 

The problems which we have described are real and significant. We do not wish to
recommend a particular solution here. Action to correct these issues should flow from
constructive community discussion of proposed solutions to these problems. Possible
solutions may involve moving existing FITS conventions into the core standard,
modifying the FITS standard to remove limitations, transferring the FITS
data model into a new serialization, or some combination of these actions.

These technical problems will be solved one way or another. If the community 
is not willing to do the hard work of hammering out a universal (or widely-adopted) 
approach, individual projects will continue to make their own ad-hoc solutions. Data 
formats will become increasingly fragmented and we will no longer enjoy the easy 
interoperability that FITS has provided for many years. 